ABSTRACT

We identify a spectroscopic sequence of galaxies, analogous to the Hubble sequence of 
morphological types, based on the Automatic Spectroscopic K-means (ASK) classifi- 
cation. Considering galaxy spectra as multidimensional vectors, the majority of the 
spectral classes are distributed along a well defined curve going from the earliest to the 
latest types, suggesting that the optical spectra of normal galaxies can be described in 
terms of a single affine parameter. Optically-bright active galaxies, however, appear as
an independent, roughly orthogonal branch that intersects the main sequence exactly 
at the transition between early and late types. 
1 INTRODUCTION

It is well known that galaxies can be classified into a 
small number of morphological types, arranged into a well-
defined sequence. The scheme proposed by Hubble (1926),
still widely in use today, is based on the optical appear- 
ance of galaxy images, and it divides the galaxy population 
into ellipticals, lenticulars, and spirals, with the irregular 
class encompassing all the objects that do not fit into any 
of the other categories. It focuses on the symmetry of the 
galaxy, the concentration of the light towards the centre, 
and the presence of other features such as disks, bars, and 
spiral arms. The Hubble sequence, also known as the Hub- 
ble tuning fork, smoothly connects the different morpholog- 
ical classes. Ellipticals, regular spirals, and barred spirals 
occupy three different arms of the sequence, intersecting at 
the lenticular class. Irregular galaxies are more difficult to 
accommodate, but they are usually placed at the end of the 
spiral branches. 

The Hubble sequence correlates with the colours 
of the galaxies, with ellipticals tending to be red and
spirals tending to be blue (Humason 1931; Hubble
1936; Morgan & Mayall 1957). The relationship, how-
ever, presents a large scatter (Connolly et al. 1995;
Sodré & Cuevas 1997; Ferrarese 2006); about half of
the red galaxies are actually disks (Masters et al. 2010;
Sánchez Almeida et al. 2011), and blue ellipticals are not so
rare as one may naively think (e.g. Schawinski et al. 2009;
Huertas-Company et al. 2010). Many works have tried to
relate the spectral energy distribution (SED) of a galaxy 
to its position along the Hubble sequence. The results are 
varied, and they are probably affected by the scatter of the 
relationship between morphological type and spectroscopic
class. Morgan & Mayall (1957) assigned the blue part of the
visible spectrum to stellar classes from A to K, finding a clear
relationship in the vein mentioned above. Aaronson (1978)
shows how the visible and IR colours of galaxies along the
Hubble sequence can be understood as a one-parameter fam-
ily, in terms of the superposition of spectra of A0V dwarf
stars and M0III giants. Bershady (1995) points out that a
simple model consisting of two stellar spectral types can re-
produce the observed broad-band colours, but only if the
spectral types are allowed to vary, which implies that the
family is not one-dimensional. Similar conclusions are also
reached by Zaritsky et al. (1995) using stellar spectrum fit-
ting.

One of the most popular spectral classification methods 
is Principal Component Analysis (PCA). It is fast and ro-
bust, and it has a sound mathematical foundation (see e.g.
Everitt 1995). For a given dataset, PCA finds the smallest
possible set of orthogonal eigenvectors that reproduce the
data within a certain accuracy. In the case of galaxy spec-
tra, Connolly et al. (1995) claim that the first two eigen-
coefficients suffice to represent most galaxies, and that the
resulting spectral types can be described in terms of a one-
parameter family. Based on the PCA decomposition of a
much larger galaxy sample, containing more than $10^5$ spec-
tra, Yip et al. (2004) highlight the importance of the third
eigenvector and summarize the galaxy spectra in terms 
of two independent angular variables. In other words, the 
galaxies are contained within a three-dimensional volume, 
given by the linear combination of the first three eigenvec- 
tors. 
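
As an illustration of the PCA idea described above, the following minimal sketch finds the leading eigenvector of a small data set by power iteration. It is not the procedure used by Connolly et al. (1995) or Yip et al. (2004); the function name `first_principal_component` and the toy data are assumptions made here for the example.

```python
import math

def first_principal_component(data, n_iter=200):
    """Return the mean vector and the leading eigenvector of the covariance
    matrix of `data` (a list of equal-length lists), using power iteration."""
    n, dim = len(data), len(data[0])
    mean = [sum(row[j] for row in data) / n for j in range(dim)]
    centred = [[row[j] - mean[j] for j in range(dim)] for row in data]
    vec = [1.0 / math.sqrt(dim)] * dim  # arbitrary starting direction
    for _ in range(n_iter):
        # Apply the (unnormalised) covariance matrix without building it:
        # first project every centred row onto vec, then recombine the rows.
        proj = [sum(r[j] * vec[j] for j in range(dim)) for r in centred]
        new = [sum(proj[i] * centred[i][j] for i in range(n)) for j in range(dim)]
        norm = math.sqrt(sum(x * x for x in new))
        vec = [x / norm for x in new]
    return mean, vec

# Toy "spectra" with three pixels that vary along a single direction (1, 2, 0).
data = [[t, 2.0 * t, 0.0] for t in (-2.0, -1.0, 0.0, 1.0, 2.0)]
mean, axis = first_principal_component(data)
```

Each spectrum is then summarized by its projection onto `axis` (its first eigencoefficient); further components would be obtained by repeating the procedure on the residuals.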

However, it is not obvious whether the galaxy distri-
bution is indeed three-dimensional, or whether it is confined to a
non-linear manifold of lower dimensionality, immersed in
the three-dimensional space. Here we study the multidimen-
sional distribution of galaxy spectra and explore the pos-
sibility that different galaxy types may be arranged into
a spectroscopic sequence, analogous to the morphological 
Hubble tuning fork. The first problem, of course, is how to 
detect such a sequence, if it existed, in a space with as many 
dimensions as data points in the spectrum. Then, if galax- 
ies did indeed form a well-defined spectroscopic sequence, 
it would be extremely interesting to quantify its multidi- 
mensional structure. Would all galaxies be arranged along 
a single curve? along several branches? along a hyperplane 
(or a higher-dimensionality subspace)? 

In principle, the dimensionality of the sequence is re- 
lated to the number of parameters that are necessary in 
order to fully describe a galaxy spectrum. If all their ob- 
servable properties depended on only one single degree of 
freedom, galaxies would describe a one-dimensional curve in 
spectral space, no matter how complicated. For two param- 
eters, the galaxy population would define a 'fundamental 
hypersurface' (not necessarily a plane), and so on. It must 
be noted, though, that these subspaces may or may not be 
fully occupied. Galaxies could be arranged in several discon- 
nected clumps, more or less randomly distributed in spectral 
space, or be confined to a certain region defined by some set 
of inequalities. 

In practice, finding, let alone characterizing, a non-
linear two-dimensional hypersurface is by no means a triv-
ial task, and even more so for structures in many dimen-
sions (see e.g. Ascasibar & Binney 2005; Ascasibar 2008,
2010). However, the existence of several relations, like the
Tully & Fisher (1977) relation for spiral galaxies, or the
Faber & Jackson (1976) relation and the fundamental plane
(Djorgovski & Davis 1987) for elliptical galaxies, provides en-
couraging evidence that galaxies can be described in terms
of very few independent parameters (Disney et al. 2008;
Tollerud et al. 2010). As mentioned above, previous studies
based on principal component analysis have concluded that
only the first two or three linear eigencoefficients are neces-
sary in order to reproduce the main features of the galaxy
spectra.

The present work follows a different approach, based on
the publicly available Automatic Spectroscopic K-means-
based (ASK) classification of all the galaxy spectra in
the seventh data release of the Sloan Digital Sky Survey
(SDSS/DR7; Stoughton et al. 2002; Abazajian et al. 2009).
A thorough description of this classification scheme is pro-
vided in Sánchez Almeida et al. (2010), to which the reader
is referred for further details, but the main aspects are
summarized in Section 2 for the sake of completeness.
We investigate whether galaxies form a continuous, single-
parameter sequence by studying the minimal spanning tree
(MST) of the template vectors defining the ASK classes. The
details of the computation of the minimal spanning tree are
given in Section 3, and Section 4 is devoted to the identifi-
cation of a possible spectroscopic sequence. A quantitative
characterization and its physical interpretation are discussed
in Section 5, and our main conclusions are then succinctly
summarized in Section 6.



2 THE ASK CLASSIFICATION 

The ASK classes are the result of classifying all the galaxies 
with spectra in the SDSS/DR7. Those with redshift smaller 
than 0.25 are transformed to a common rest-frame wave- 
length scale, and then re-normalized to the integrated flux 
in the SDSS g-filter. These two are the only manipula- 
tions the spectra undergo before classification. The classi- 
fication is driven only by the shape of the spectra, and these 
two corrections remove the obvious undesired dependen-
cies on the redshift and apparent magnitude of the galaxy.
Sánchez Almeida et al. (2010) deliberately avoid correcting
for other known effects requiring modelling and assump-
tions (e.g., dust extinction, seeing, or aperture effects), in
the spirit of the rules for a good classification put forward
by Sandage (2005), where it is pointed out that physics must
not drive a classification. Otherwise, the arguments become 
circular when the classification is used to infer the underly- 
ing physics. 
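
The two manipulations described above can be sketched as follows. This is only an illustration under stated assumptions: the g filter is approximated by a hypothetical boxcar between 4000 and 5500 Å rather than the true SDSS response curve, the resampling onto a common wavelength grid is omitted, and the function names are invented for the example.

```python
def to_rest_frame(wavelengths, redshift):
    """Shift observed wavelengths to the rest frame of the galaxy."""
    return [w / (1.0 + redshift) for w in wavelengths]

def band_flux(wavelengths, flux, band):
    """Trapezoidal integral of the flux over the pixels lying inside `band`."""
    lo, hi = band
    total = 0.0
    for k in range(len(wavelengths) - 1):
        w0, w1 = wavelengths[k], wavelengths[k + 1]
        if w0 >= lo and w1 <= hi:
            total += 0.5 * (flux[k] + flux[k + 1]) * (w1 - w0)
    return total

def normalise(wavelengths, flux, band=(4000.0, 5500.0)):
    """Rescale the spectrum so that its integrated flux in `band` is one.
    The default band is an assumed boxcar stand-in for the g filter."""
    norm = band_flux(wavelengths, flux, band)
    return [f / norm for f in flux]

rest = to_rest_frame([4800.0, 5400.0, 6000.0, 6600.0, 7200.0], 0.2)
scaled = normalise(rest, [2.0, 2.0, 2.0, 2.0, 2.0])
```

After these two steps only the shape of the spectrum remains, which is the sole input to the classification.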

The classification algorithm used, k-means, is a well-
known, robust workhorse, commonly employed in data min-
ing, machine learning, and artificial intelligence (see e.g.
Everitt 1995; Bishop 2006). Its computational efficiency was
an important asset in order to carry out the simultane- 
ous classification of the full data set (~ 12 GB). The al- 
gorithm works by iteratively assigning each galaxy to the 
nearest class in spectral space and re-evaluating the class 
template spectrum as the average over all the class mem- 
bers. In the end, 99% of the galaxies can be assigned to 
only 17 major classes, with 11 additional minor classes 
describing the remaining one percent. The actual number 
of classes has some uncertainty, although it is automati- 
cally provided by the algorithm, which always renders be- 
tween 15 and 19 major classes. The template spectra vary 
smoothly and continuously, and they are labelled from 0
to 27 according to their (u - g) colour, from reddest to
bluest. It is unclear whether the ASK classes represent gen-
uine clusters in the 1637-dimensional classification space, or
whether they are part of a continuous distribution (see the discussion
in Sánchez Almeida et al. 2010, as well as Section 5). The
class templates cover all the possible spectral shapes, and 
we use them in our search for a sequence. Since we are not 
interested in the occupation distribution along such a se- 
quence, all templates are treated equally, even though each 
class contains a different number of SDSS/DR7 galaxies. 
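
The iteration described above (assign each spectrum to the nearest template, then recompute each template as the average of its members) can be sketched as below. This is a generic textbook version with a fixed number of classes and user-supplied initial templates; unlike the ASK procedure, it does not determine the number of classes automatically, and the name `kmeans` is an assumption of this example.

```python
def kmeans(spectra, templates, n_iter=100):
    """Plain k-means on a list of spectra (equal-length lists of fluxes).
    `templates` holds the initial class templates."""
    templates = [list(t) for t in templates]
    members = [[] for _ in templates]
    for _ in range(n_iter):
        # Assignment step: each spectrum joins the nearest template.
        members = [[] for _ in templates]
        for s in spectra:
            nearest = min(
                range(len(templates)),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(s, templates[c])),
            )
            members[nearest].append(s)
        # Update step: each template becomes the average of its members
        # (an empty class keeps its previous template).
        updated = [
            [sum(col) / len(group) for col in zip(*group)] if group else templates[c]
            for c, group in enumerate(members)
        ]
        if updated == templates:
            break
        templates = updated
    return templates, members

spectra = [[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0]]
templates, members = kmeans(spectra, [[0.0, 0.0], [10.0, 10.0]])
```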



3 MINIMAL SPANNING TREE 

The MST of a graph (e.g. Kruskal 1956) is the set of edges
that connect all the vertices in the graph at a minimum cost, 
defined as the sum of the individual costs of all the edges 
included in the tree. In our case, these individual costs are 
given by the differences between the template spectra, and 
the MST can be thought of as the shortest possible 'road 
network' connecting all spectral classes. 

Figure 1. Minimal spanning tree of the ASK classes, based on the Euclidean (top) and Manhattan (bottom) distance between template
spectra, considering, from left to right, the full wavelength range, only the red part ($\lambda > 6000$ Å), the blue part ($\lambda < 6000$ Å), and the
same bands used to define the ASK classes.

Although real life is a little bit more complicated (see
below), one may expect that, if galaxies, and thus classes,
were roughly arranged into a single curved line, the MST
would be ideally suited to identify such a multidimensional 
sequence. For example, if A, B, C, D, and E are different 
types of galaxy, forming the sequence A-B-C-D-E, this would 
be their minimal spanning tree, and it can be proven that 
any other combination of edges would result in a longer total 
distance. In this example, the extreme classes A and E would 
have just one connection, whereas the intermediate types B, 
C, and D would have two. Moreover, the distance between 
next-to-consecutive classes (e.g. A and C) must be larger 
than both AB and BC. If A, B, and C are aligned along a 
straight line, the equality AC = AB + BC will hold, whereas 
for a curved line AC < AB + BC. 
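
The A-B-C-D-E example can be checked numerically with a short sketch. It builds the tree with Prim's algorithm (not Kruskal's, which the text cites; both yield a minimal spanning tree), and the coordinates of the five points are hypothetical values chosen to lie on a curved line.

```python
import math

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def minimal_spanning_tree(points, distance=euclidean):
    """Prim's algorithm. `points` maps a class label to its coordinates;
    the result is the list of edges (pairs of labels) of the tree."""
    labels = list(points)
    in_tree = {labels[0]}
    edges = []
    while len(in_tree) < len(labels):
        # Cheapest edge joining a vertex inside the tree to one outside it.
        _, u, v = min(
            (distance(points[u], points[v]), u, v)
            for u in in_tree
            for v in labels
            if v not in in_tree
        )
        edges.append((u, v))
        in_tree.add(v)
    return edges

# Five hypothetical classes along a parabola.
classes = {"A": (0.0, 0.0), "B": (1.0, 0.2), "C": (2.0, 0.8),
           "D": (3.0, 1.8), "E": (4.0, 3.2)}
tree = minimal_spanning_tree(classes)
```

For these points the tree contains exactly the edges A-B, B-C, C-D, and D-E, and the distance AC is larger than both AB and BC but smaller than AB + BC, as expected for a curved line.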

If the galaxy distribution had more than one dimension, 
some classes would become 'tree nodes' featuring three or 
more connections. If the subspace defined by the galaxies is 
fully occupied, there will be a large number of nodes, and 
it would be difficult to obtain much information about its 
structure from the MST alone. On the other hand, a small 
number of nodes would imply that galaxies are arranged into 
a few discrete 'branches' with different orientations. 
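
Given the list of edges of a tree, the end points and the 'tree nodes' mentioned above follow directly from the number of connections of each vertex. The sketch below uses a hypothetical Y-shaped tree; `classify_vertices` is a name introduced for the example.

```python
from collections import Counter

def classify_vertices(edges):
    """Return (ends, nodes): vertices with exactly one connection and
    vertices with three or more connections, respectively."""
    degree = Counter()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    ends = sorted(v for v, d in degree.items() if d == 1)
    nodes = sorted(v for v, d in degree.items() if d >= 3)
    return ends, nodes

# A main sequence A-B-C-D with one ramification C-E.
ends, nodes = classify_vertices([("A", "B"), ("B", "C"), ("C", "D"), ("C", "E")])
```

Here C is the only node and A, D, and E are the ends of the three branches.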

The minimal spanning tree of ASK classes is shown in
Figure 1, using eight slightly different definitions of the dis-
tance in spectral space. In the top panels, we assume the
Euclidean metric

$$ d_{AB}^2 = \sum_{i=1}^{N_\lambda} \left[ C_B(\lambda_i) - C_A(\lambda_i) \right]^2 \tag{1} $$

whereas the Manhattan distance

$$ d_{AB} = \sum_{i=1}^{N_\lambda} \left| C_B(\lambda_i) - C_A(\lambda_i) \right| \tag{2} $$

has been used in the bottom panels. In both cases, $d_{AB}$
denotes the distance between classes A and B, and $C_X(\lambda)$ is
the spectral template of class X, evaluated at a wavelength
$\lambda$. We investigate different criteria to select the set of $N_\lambda$
discrete wavelengths involved in the computation in order
to test the stability of the results: the whole spectral energy
distribution between 3800 and 9250 Å ($N_\lambda = 3850$), only
the bluest part ($\lambda < 6000$ Å, $N_\lambda = 1977$), the red part
($\lambda > 6000$ Å, $N_\lambda = 1873$), and the same 17 bands that were
used to define the ASK classification ($N_\lambda = 1637$, see Table 1
in Sánchez Almeida et al. 2010, for the precise definition of
the bandpasses). Note that, in each case, the dimensionality
of the data space is given by the value of $N_\lambda$.
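
Both metrics, together with the restriction to a subset of wavelengths, can be written in a few lines. In this sketch the templates are plain lists of fluxes and `selected` is a hypothetical list of pixel indices standing in for the chosen wavelength range or bandpasses.

```python
import math

def euclidean_distance(template_a, template_b, selected=None):
    """Equation (1): square root of the summed squared differences."""
    idx = range(len(template_a)) if selected is None else selected
    return math.sqrt(sum((template_b[i] - template_a[i]) ** 2 for i in idx))

def manhattan_distance(template_a, template_b, selected=None):
    """Equation (2): sum of the absolute differences."""
    idx = range(len(template_a)) if selected is None else selected
    return sum(abs(template_b[i] - template_a[i]) for i in idx)

c_a = [1.0, 2.0, 3.0]
c_b = [2.0, 4.0, 3.0]
full = (euclidean_distance(c_a, c_b), manhattan_distance(c_a, c_b))
first_pixel_only = (euclidean_distance(c_a, c_b, [0]), manhattan_distance(c_a, c_b, [0]))
```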

The fourth option - using the spectral range used to 
define the ASK classification - is arguably the most nat- 
ural. Using the spectral range to the red end, where the 
mean spectra of each class have been extrapolated, or the 
blue end, which is often dominated by the presence of strong 
emission lines, seem to be poor choices that probably intro- 
duce some noise in the resulting MST. Also, the Euclidean 
metric, where differences add in quadrature, seems to be 
more appropriate for comparing two spectra than the Man- 
hattan distance. We simply use all these different definitions 
in order to test the stability of our results. Although the so- 
lution is far from being unique (the exact ordering of the 
classes depends on the adopted definition), the overall pic- 
ture is fairly robust: according to the MST, the ASK classes 
representing the galaxy population in the SDSS seem to be 
distributed along three main spectroscopic branches, or, al- 
ternatively, along a main spectroscopic sequence with one 
ramification. 

Figure 2. Template spectra of several ASK classes representative
of the early-type (top panel), late-type (middle panels), and active
(bottom panel) branches. $C_X$ denotes the spectral template of
class X.

The longest branch, both in terms of the number of
classes and the extent measured by the Euclidean distance
(i.e. differences in the template spectra), is composed of ASK
types 15, 17, 20, 21, 25, 27, 26, 24, 23, 22, 18, and 19. The po-
sition of these classes on the principal components plane, as
well as on the colour-colour and BPT (Baldwin et al. 1981)
diagrams (see Figures 7 and 8), suggests that this branch
corresponds to the sequence of dwarf irregular galaxies. It
merges smoothly with the location in spectral space occu-
pied by normal spirals, represented by ASK classes 16, 14,
13, 12, and 9. Early-type galaxies (classes 0, 2, 3, and 5) are
also grouped together in another branch for all measures of
the distance, and the same can be said of the active galaxy
types 8, 7, 6, 11, and 10. All three branches (early-type, late-
type, and active) seem to converge around classes 9, 10, or
12, depending on the definition used. Classes 1 and 4 seem to
be outsiders in the main red sequence, and we are currently
investigating the possibility that they are associated with heav-
ily dust-reddened spirals (Sánchez Almeida et al. 2011). A
sample of spectra that are representative of each branch is
plotted in Figure 2.



4 A SPECTROSCOPIC SEQUENCE? 

We argue that the three independent branches we have iden- 
tified trace an underlying spectroscopic sequence, analogous 
to the Hubble tuning fork of galaxy morphologies, and the 
subtle differences between the MST obtained for different 
definitions of the distance are due to the presence of random 
deviations of the individual galaxy spectra with respect to 
the average behaviour of the sequence. 

In other words, our branches are not ideally thin hy- 
perlines in the data space, but 'hypertubes' with a certain, 
variable thickness, where the contributions of intrinsic phys- 
ical dispersion of the galaxy properties as well as extrinsic 
observational errors add in quadrature. Given the large num- 
ber of objects involved in most ASK classes, measurement 
errors have a negligible effect on the mean spectrum, but 
they make a significant contribution to the dispersion of the 
individual galaxies around the mean (although the error in 
the mean spectrum decreases as the square root of the num- 
ber of observations, the actual dispersion of the distribution 
is independent of the number of galaxies). 
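
The distinction between the dispersion of individual galaxies and the error of the mean spectrum can be made concrete with a small numerical sketch. The values used below (dispersions of 3 and 4 in arbitrary units, ten thousand galaxies in a class) are purely illustrative assumptions.

```python
import math

def total_dispersion(sigma_intrinsic, sigma_observational):
    """Independent contributions add in quadrature."""
    return math.hypot(sigma_intrinsic, sigma_observational)

def error_of_mean(sigma, n_galaxies):
    """The uncertainty of the class average shrinks as the square root of
    the number of galaxies; the dispersion `sigma` itself does not."""
    return sigma / math.sqrt(n_galaxies)

sigma = total_dispersion(3.0, 4.0)     # thickness of the 'hypertube'
template_error = error_of_mean(sigma, 10000)
```

With these numbers the thickness is 5 units regardless of the class size, whereas the template is determined a hundred times more precisely.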

Due to the finite thickness of the branches, the ASK 
classes derived from the k-means algorithm will not be 
aligned along the centres of these hypertubes, but they will 
alternate along their boundaries. This can be easily illus- 
trated by a simple experiment, where we set up a random 
distribution of data points that corresponds to a hypertube 
in two dimensions. The first coordinate varies uniformly 
from 0.1 to 0.9, and the second follows a Gaussian distri- 
bution centered at 0.5 with a standard deviation of 0.1. The 
data points and the centres of the final classes returned by 
the k-means algorithm are plotted in Figure 3 as dots and
open boxes, respectively. This configuration, where classes 
(i.e. template spectra) alternate between the boundaries of 
the distribution rather than tracing the centre, will occur 
whenever the dispersion around the mean is comparable 
to the typical inter-class distance. During the first itera- 
tion, the classes are initialized at the locations of randomly- 
picked data points. Then, each class collects the points in its 
Voronoi cell, and its position is updated to the new centre of 
mass. The process is repeated until the classes arrange them-
selves into the pattern shown in Figure 3, which roughly
corresponds to the most efficient packing in two dimensions. 
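
A version of this experiment can be reproduced with the sketch below. The number of data points (2000), the number of classes (12), and the random seed are assumptions of this example, since the text does not specify them; the distribution of the two coordinates and the initialization at randomly picked data points follow the description above.

```python
import random

def kmeans_2d(points, k, rng, n_iter=200):
    # Initialise the classes at the locations of randomly picked data points.
    centres = rng.sample(points, k)
    for _ in range(n_iter):
        cells = [[] for _ in range(k)]
        for p in points:
            nearest = min(
                range(k),
                key=lambda c: (p[0] - centres[c][0]) ** 2 + (p[1] - centres[c][1]) ** 2,
            )
            cells[nearest].append(p)
        # Move each class to the centre of mass of its Voronoi cell
        # (an empty cell keeps its previous centre).
        updated = [
            (sum(p[0] for p in cell) / len(cell), sum(p[1] for p in cell) / len(cell))
            if cell else centres[c]
            for c, cell in enumerate(cells)
        ]
        if updated == centres:
            break
        centres = updated
    return centres

rng = random.Random(1)  # fixed seed, only for reproducibility
points = [(rng.uniform(0.1, 0.9), rng.gauss(0.5, 0.1)) for _ in range(2000)]
centres = sorted(kmeans_2d(points, 12, rng))
```

Listing the second coordinate of the sorted `centres` allows one to inspect whether consecutive classes fall on alternating sides of 0.5 rather than along the centre line.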

In the general case, the MST will be able to pinpoint a 
sequence with a finite thickness, but it will zigzag through 
the distribution rather than crossing it along a more or less 
straight line. More precisely, the angle between the direc- 
tions of two consecutive edges (say, AB and BC) may not 
necessarily be small, and the distance AC may be compa- 
rable to both AB and BC. On the contrary, the segments 
AC, CE, EG, etc. - which do not belong to the MST - trace 
the boundary of the distribution and, unless there is a sharp
turn, the angles between them will be typically smaller than
the angles between consecutive edges in the MST.

Figure 3. Results of the k-means algorithm for a random dis-
tribution of points in two dimensions (see text). Class centres are
shown by the open boxes.

Figure 4. Distances (blue) and angles (black) between all the edges that are present in any of the four MST depicted on the top panel
in Figure 1. Solid lines show all the connections appearing at least once, for any definition of the distance, but numeric values have been
computed using the same bands as in the definition of the ASK classes. The early, late, and active branches have been highlighted in
different shades, and dotted lines have been added to guide the eye when reading the angles.

We therefore investigated all the distances and angles
between consecutive vertices in any of the four variants of
the MST defined with the Euclidean metric distance shown
in the previous section. A schematic representation is plotted
in Figure 4 (see also Appendix A). Being a projection of a
non-linear multidimensional structure, it is necessarily not
to scale, but we have tried to reproduce actual distances
and angles as faithfully as possible without sacrificing clarity.
Precise values of the distances and angles, based on the ASK
bands (last MST in Figure 1), are indicated by the small
numbers in blue and black colour, respectively. Distances are
given by equation (1), while the angle between two vectors
is computed from the usual Euclidean scalar product

$$ \cos\theta = \frac{\sum_{i=1}^{N_\lambda} v_{AB}(\lambda_i)\, v_{AC}(\lambda_i)}{d_{AB}\, d_{AC}} \tag{3} $$

where

$$ v_{AB}(\lambda_i) = C_B(\lambda_i) - C_A(\lambda_i) \tag{4} $$



is the vector connecting the spectral templates of classes 
A and B, and a similar definition holds for $v_{AC}$. Although
the precise values of the distances depend, of course, on the 
adopted wavebands, neither the relations between them nor 
the angle between two edges are very sensitive to that choice. 
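
The angle between two edges that share a vertex follows from the Euclidean scalar product of the corresponding difference vectors. The sketch below uses three hypothetical two-pixel templates placed at the corners of an equilateral triangle; the function names are introduced only for this example.

```python
import math

def edge(template_a, template_b):
    """Vector connecting the templates of classes A and B."""
    return [b - a for a, b in zip(template_a, template_b)]

def angle_degrees(v, w):
    """Angle between two vectors, from the Euclidean scalar product."""
    dot = sum(x * y for x, y in zip(v, w))
    norm_v = math.sqrt(sum(x * x for x in v))
    norm_w = math.sqrt(sum(y * y for y in w))
    cosine = max(-1.0, min(1.0, dot / (norm_v * norm_w)))  # guard against rounding
    return math.degrees(math.acos(cosine))

c_a = [0.0, 0.0]
c_b = [1.0, 0.0]
c_c = [0.5, math.sqrt(3.0) / 2.0]
theta = angle_degrees(edge(c_a, c_b), edge(c_a, c_c))
```

For this configuration `theta` is 60 degrees to within rounding, the value expected when classes are arranged in equilateral triangles.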

In order to illustrate the figure, let us take a simple 
node, for example number 21. There is one edge towards 
node 20 because these two classes are connected in the first, 
third, and fourth panels in Figure 1, and there are two ad-
ditional edges towards 15 and 27 because of the connections
in the second panel. The distances from class 21 to classes
15, 20, and 27 are marked in blue as 81.7, 36.5, and 48.5,
respectively. The angles between edges 21-27 and 21-25
(34 degrees), 21-25 and 21-20 (57 degrees), 21-20 and
21-15 (99 degrees), and 21-15 and 21-27 (164 degrees)
are indicated by the black numbers.

Our results are consistent with the pattern described 
above, where ASK classes would be arranged along the 
boundaries of three independent branches with a thickness 
of about two classes. These branches are not straight lines, 
nor do they lie in the same hyperplane, but they represent 
clearly defined sequences in spectral space. The angles be- 
tween the edges that trace the boundaries are close to 180 
degrees, indicating that the sequences describe a relatively 
smooth curve, while the interior angles are close to 60 de- 
grees, implying the classes are arranged in roughly equilat- 
eral triangles within the sequence. 
5 DISCUSSION

Our main result is the identification of a spectroscopic se- 
quence with three separate branches, which represent - at 
least in a qualitative sense - early-type, late-type, and active 
galaxies. However, several questions remain open: How many 
parameters are necessary in order to fully specify the spec- 
tral properties of a galaxy? How could one compute their 
value, and what is their physical meaning? What do they 
tell us about galaxy formation and evolution? 

Concerning the first question, the configuration of the 
ASK classes in spectral space suggests that the optical spec- 
tra in the SDSS/DR7 have only two degrees of freedom. 
An interpretation in terms of a single parameter would be 
that the putative one-dimensional curve in spectral space 
marches through the early-type galaxies, climbs up and 
down the active branch, and then moves on towards the 
late types. Alternatively, and arguably more likely, it would 
also be possible that there is a main sequence going from 
early- to late-type galaxies, but some of them (especially
those in the green valley; see e.g. Salim et al. 2007) may be
temporarily found in an active state that takes them out of 
the main sequence. In this scenario, the optical spectrum 
of a galaxy can be accurately described in terms of one dis- 
crete parameter (whether the galaxy belongs to the 'normal' 
or the 'active' branches) and one real number characterizing 
its position along the corresponding sequence. 

This interpretation is reinforced when one tries to find 
a bidimensional projection that captures the main features 
of the spectral classification. After some experimentation, 
we have selected the hyperplane defined by ASK classes 0, 
5, and 8, corresponding to a typical early-type galaxy, an 
object in the green valley, and an extremely active galaxy, 
respectively. Looking at the distances and angles between 
the classes, one can easily verify that classes 5, 6, 7, and 8 
are roughly aligned along a more or less straight line, at an 
angle of 102 degrees (almost perpendicular) with respect to 
the segment connecting classes 5 and 0. We have thus se- 
lected the template of the green-valley class 5 as the origin 
of coordinates. The axis towards class 8 provides a measure 
of galaxy activity, and we have set the normalization so that 
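this class represents the unit value. For a galaxy with spec-
tral energy distribution $S(\lambda)$, the activity A is defined as
the scalar product

$$ A = \sum_{i=1}^{N_\lambda} \left[ S(\lambda_i) - C_5(\lambda_i) \right] \Delta_A(\lambda_i) \tag{5} $$

with the vector

$$ \Delta_A(\lambda_i) = \frac{C_8(\lambda_i) - C_5(\lambda_i)}{\sum_{j=1}^{N_\lambda} \left[ C_8(\lambda_j) - C_5(\lambda_j) \right]^2} \tag{6} $$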



On the other hand, the axis defined by class 0 can be used
to quantify the spectral type of the galaxy in a similar way:

$$ T = \sum_{i=1}^{N_\lambda} \left[ S(\lambda_i) - C_5(\lambda_i) \right] \Delta_T(\lambda_i) \tag{7} $$

where

$$ \Delta_T(\lambda_i) \propto \left[ C_0(\lambda_i) - C_5(\lambda_i) \right] - \frac{\sum_{j=1}^{N_\lambda} \left[ C_0(\lambda_j) - C_5(\lambda_j) \right] \Delta_A(\lambda_j)}{\sum_{j=1}^{N_\lambda} \Delta_A^2(\lambda_j)}\, \Delta_A(\lambda_i) \tag{8} $$

is orthogonal to $\Delta_A$, and it is normalized so that the type
of class 0 is equal to -1, i.e., we chose the scaling constant
in equation (8) so that

$$ \sum_{i=1}^{N_\lambda} \left[ C_0(\lambda_i) - C_5(\lambda_i) \right] \Delta_T(\lambda_i) = -1 \tag{9} $$

Figure 5. Basis for the galaxy type-activity decomposition. The
coordinate origin, set by the template spectrum of ASK class 5,
is shown on the top panel. Middle and bottom panels show the
basis vectors $\Delta_T$ and $\Delta_A$ determining type and activity, respec-
tively. The blue bands indicate the ASK bandpasses, also used to
compute the values of type and activity plotted in Figure 6.

Table 1. Template spectrum of ASK class 5 and basis vectors $\Delta_T$
and $\Delta_A$ as a function of wavelength. Last column is a binary flag
indicating whether that particular wavelength is included in the
bands used to define the ASK classes. The full table is available
in the electronic version.

  λ [Å]      C_5         Δ_T         Δ_A          Flag
  ...        ...         6.045e-03   -2.552e-05   ...
  3804.39    6.226e-01   6.356e-03   -3.207e-05   ...
  3805.27    6.319e-01   6.480e-03   -3.243e-05   ...
  3806.15    6.324e-01   6.466e-03   -3.256e-05   ...
  3807.03    6.310e-01   6.484e-03   -2.825e-05   ...
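
The type-activity decomposition of equations (5) to (9) can be sketched as follows. The three-pixel templates used here are made-up numbers, not the actual ASK templates, and the function names are assumptions of this example; the construction of the basis (normalization of the activity axis, orthogonalization and scaling of the type axis) follows the definitions above.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def type_activity_basis(c0, c5, c8):
    """Basis vectors (delta_a, delta_t) built from the templates of
    classes 0, 5, and 8, following equations (6), (8), and (9)."""
    d8 = [a - b for a, b in zip(c8, c5)]
    delta_a = [x / dot(d8, d8) for x in d8]                      # eq. (6)
    d0 = [a - b for a, b in zip(c0, c5)]
    coeff = dot(d0, delta_a) / dot(delta_a, delta_a)
    raw = [x - coeff * y for x, y in zip(d0, delta_a)]           # eq. (8), unscaled
    scale = -1.0 / dot(d0, raw)                                  # eq. (9): T(class 0) = -1
    delta_t = [scale * x for x in raw]
    return delta_a, delta_t

def activity_and_type(spectrum, c5, delta_a, delta_t):
    """Equations (5) and (7) for a spectrum sampled on the template grid."""
    diff = [s - c for s, c in zip(spectrum, c5)]
    return dot(diff, delta_a), dot(diff, delta_t)

# Hypothetical three-pixel templates of classes 0, 5, and 8.
c0, c5, c8 = [2.0, 1.0, 1.5], [1.0, 1.0, 1.0], [1.0, 1.0, 3.0]
delta_a, delta_t = type_activity_basis(c0, c5, c8)
coords = {name: activity_and_type(c, c5, delta_a, delta_t)
          for name, c in (("0", c0), ("5", c5), ("8", c8))}
```

By construction, class 5 sits at the origin, class 8 has unit activity and zero type, and class 0 has type -1; its activity is in general not zero because the segments 5-0 and 5-8 are not exactly perpendicular.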



The spectral template of class 5, together with the ba-
sis vectors $\Delta_A$ and $\Delta_T$, are represented in Figure 5, and the
location of ASK classes in the galaxy type-activity plane is
plotted in Figure 6. The shape of the spectroscopic sequence
is evident in this representation, which has the advantage
that its two parameters are easy to evaluate for any galaxy
(numeric values for $C_5$, $\Delta_A$, and $\Delta_T$ are provided in Table 1)
and have a pretty clear physical meaning: the spectral type
quantifies the evolutionary state of the galaxy, while the ac-
tivity component reflects the presence of warm ionized gas.
An interesting feature that was not obvious from the study
of the angles between edges in the MST is that the end of the
late-type branch turns into a direction that is roughly paral-
lel (more precisely, forms an angle of 20 degrees) in spectral
space to the axis defined by the active galaxies. From Fig-
ure 5 we immediately see that this is due to the presence of
strong emission lines that make a significant contribution to 
the overall luminosity, but both branches can be easily iden- 
tified with the well-known AGN/starburst dichotomy in the 
BPT diagram. 

The projection of our spectroscopic sequence onto the 
BPT diagram, as well as the colour-colour plane defined by 
(u - g) and (g - r), is shown in Figure 7. The early and
late branches correspond to the red sequence and the blue 
cloud in the colour-colour plot, respectively, and all the ac- 
tive galaxies are located approximately at the same place, in 
the region of the green valley. On the other hand, AGN can 
be neatly separated from the late-type galaxy population in 
the BPT diagram, but in this case the early-type branch 
is absent because the intensity of the Hβ line is too faint
(in fact, it is observed in absorption for ASK classes 0, 2, 
and 3). The projection onto the type-activity plane has the 
advantage that the full structure of the sequence, with its 
three branches, can be represented simultaneously. 

Another interesting space for projection is the volume
defined by the first three eigenvectors resulting from a prin-
cipal component analysis (Yip et al. 2004). As in the case of
the colour-colour and BPT diagrams, the one-dimensional
nature of the galaxy distribution is also encoded in the con-
figuration of the ASK classes within the three-dimensional
volume defined by the first PCA coefficients ($a_1$, $a_2$, $a_3$), as
well as in the projections of the spectroscopic sequence onto
the three orthogonal planes ($a_1$, $a_2$), ($a_1$, $a_3$), and ($a_2$, $a_3$)
shown in Figure 8. Note that the projection of the curve in
the first two eigenvalues ($a_1$, $a_2$) is qualitatively similar to
the location of the classes in the type-activity plane (Fig-
ure 6). This fact connects our sequence with the finding by
Connolly et al. (1995) that most galaxy spectra form a one-
parameter family defined by the ratio between the first two
eigenvalues. The original PCA-based sequence does not in-
clude the active branch we have identified, although it can
be guessed in Figure 4 of Yip et al. (2004).
Figure 7. The spectroscopic sequence in the colour-colour (top)
and BPT (bottom) diagrams. 

It is convenient to emphasize that the parameters A and 
T are not meant to describe the galaxy spectra by them-
selves. More precisely,

$$ S(\lambda_i) \neq C_5(\lambda_i) + A\, \Delta_A(\lambda_i) + T\, \Delta_T(\lambda_i) \tag{10} $$



in contrast to the PCA decomposition. Their purpose is 
merely to provide a quantitative, continuous, linear map- 
ping between the SED of a given galaxy and our spectro- 
scopic sequence. Although a detailed discussion is beyond 
the scope of the present work, the position of a given galaxy 
in the type-activity plane may be used for assigning it to 
the appropriate branch, as well as determining its position 
along the corresponding one-dimensional curve. The galaxy 
spectrum could then be recovered by interpolating the tem- 
plates of the nearest ASK classes, but the procedure is less 
straightforward than PCA. 

Our results are consistent with other approaches to 
galaxy classification, but they highlight the remarkably few 
degrees of freedom that are necessary in order to character- 
ize optical spectra. Out of the several thousand dimensions 
of the data space, the vast majority of SDSS galaxies are 
confined to just two one-dimensional curves, contained in a 
three-dimensional Euclidean volume. 

While it is well known that the structure of dark matter
haloes can be described in terms of one or two free param-
eters (see e.g. Ascasibar & Gottlöber 2008, and references
therein), related to the statistical properties of the primor-
dial perturbations of the density field (e.g. Ascasibar et al.
2004, 2007), it is somewhat surprising that galaxy forma-
tion, with all the complex physical processes involved, does
not seem to introduce additional degrees of freedom. In our
opinion, understanding why the optical SED of a galaxy con-
tains so little information is an important piece of the puzzle
of galaxy formation and evolution, and it poses a very strong
constraint on the number of free parameters that are avail-
able to theoretical models.

optically-selected AGN are associated with one particular
stage of galactic evolution, in agreement with earlier re-
sults (e.g. Schawinski et al. 2007, 2010) and consistent with
a scenario where quasar activity marks the termination of
star formation and the transition from late to early type
(see e.g. Hickox et al. 2009 and references therein). If this
is true, the type-activity diagram - and the spectroscopic
sequence - will probably have a similar form at high red-
shift, although each branch would be located in a different
region of the spectral space and populated by galaxies of
very different physical properties.

Figure 8. Top: Distribution of the ASK classes in the space of
the first three eigenvalues of the PCA decomposition by Yip et al.
(2004). Note how the classes follow a one-dimensional curve in
this three-dimensional space, with a diverticulum corresponding
to the active branch. Bottom: Projection of the curve in the three
orthogonal planes.

Finally, let us note that, like the Hubble morphologi- 
cal sequence, the spectroscopic sequence presented here pro- 
vides a snapshot of the distribution of galaxies in spectral 
space at the present day, but it does not imply an evolu- 
tionary track, nor does it contain any temporal information 
whatsoever. Galaxies of type 9 are similar to galaxies of type 
10, and these, in turn, have spectra that are close to those 
of galaxies of type 11, but this does not mean that they turn 
into one another. 

Nevertheless, our main sequence can be interpreted in 
terms of the average age of the stellar population, with 
younger galaxies corresponding to larger values of the spec- 
tral type. The almost perpendicular location of the active 
branch with respect to the main sequence suggests that 



6 CONCLUSIONS 

In this work, we have investigated the distribution of galax- 
ies in spectral space. More precisely, we have computed the 
minimal spanning tree of the class templates in the Auto- 
matic Spectroscopic K-means-based (ASK) classification of 
SDSS/DR7 data. By studying the distances and angles in
spectral space between different galaxy types, we find
that galaxies in the local universe are distributed along
a spectroscopic sequence with three independent branches. 
These branches contain the spectra of early-type, late-type, 
and active galaxies, and they intersect in the spectral region 
corresponding to the 'green valley'. 
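
The kind of analysis summarized above can be sketched on toy data as follows. This is a minimal illustration, assuming a Euclidean distance between template vectors and using Prim's algorithm to build the tree; the function names, the five three-dimensional "templates", and the choice of algorithm are assumptions for the example and do not reproduce the actual ASK templates or the implementation used in this work.

```python
import math

def distance(u, v):
    """Euclidean distance between two template vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def angle_deg(a, b, c):
    """Angle at vertex b between the segments b->a and b->c, in degrees."""
    ba = [x - y for x, y in zip(a, b)]
    bc = [x - y for x, y in zip(c, b)]
    cos = sum(x * y for x, y in zip(ba, bc)) / (distance(a, b) * distance(c, b))
    return math.degrees(math.acos(max(-1.0, min(1.0, cos))))

def minimum_spanning_tree(points):
    """Prim's algorithm on the complete graph; returns a sorted list of edges (i, j)."""
    n = len(points)
    in_tree = {0}
    edges = []
    while len(in_tree) < n:
        # Pick the shortest edge joining the tree to a point outside it.
        _, i, j = min(
            (distance(points[i], points[j]), i, j)
            for i in in_tree
            for j in range(n)
            if j not in in_tree
        )
        in_tree.add(j)
        edges.append((min(i, j), max(i, j)))
    return sorted(edges)

# Toy templates: a chain 0-1-2-3 with a side branch (4) attached to node 2.
templates = [
    [0.0, 0.0, 0.0],
    [1.0, 0.0, 0.0],
    [2.0, 0.0, 0.0],
    [3.0, 0.0, 0.0],
    [2.0, 0.9, 0.0],
]

edges = minimum_spanning_tree(templates)
# Angle between the main chain and the side branch at their intersection.
branch_angle = angle_deg(templates[1], templates[2], templates[4])
```

In this toy configuration the tree recovers the chain plus one side branch meeting it at a right angle, which mimics, in a very simplified way, a sequence with an additional branch attached at an intermediate class.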

This configuration contains two degrees of freedom: one 
discrete parameter that determines whether a galaxy be- 
longs to the 'normal' (either early- or late-type) part of the 
sequence or to the 'active' branch, as well as one continuous 
affine parameter that describes the position of the galaxy 
along its one-dimensional branch. We interpret the normal 
branches as a main galactic sequence, described in terms 
of one single affine parameter indicative of the evolution- 
ary state of the object (a combination of mean stellar age, 
gas abundance, and chemical composition). At some point 
in the evolution of a galaxy, star formation stops, and the 
object moves from the late-type to the early-type branch. 
During that phase, some galaxies can also be found in the 
active branch. In agreement with previous work, we find that 
optical AGN activity is exclusively associated with that par- 
ticular period. 

We have verified that our results are robust with respect 
to different definitions of the distance between galaxy spec- 
tra. A straightforward prescription and a set of basis vectors 
are provided in order to quickly evaluate the spectral type 
and activity of a given galaxy. In the future, we would like 
to study the relation between spectral type and optical mor- 
phology for SDSS galaxies and to apply the same technique 
to a sample of objects at higher redshifts. 
APPENDIX A: DISTANCES AND ANGLES
BETWEEN ASK CLASSES

Given the large amount of data plotted in Figure 3, here we
represent the early, late, and active branches of our
spectroscopic sequence in Figures A1, A2, and A3, respectively. For
the sake of clarity, distances and angles between the different
ASK classes are shown in separate panels.

Figure A2. Distances (left) and angles (right) between the ASK
classes of the late branch.

Figure A3. Distances (left) and angles (right) between the ASK
classes of the active branch.



